# Full text of "Mathematical Contributions to the Theory of Evolution.--On a Form of Spurious Correlation Which May Arise When Indices Are Used in the Measurement of Organs"

```Mathematical Contributions to the Theory of Evolution. 489

was always less than the standard error when only x_2 was taken into
account, unless

    \rho_{13} = 0.

We may now prove the similar theorem that when we use three
variables, x_2, x_3, x_4, on which to base the estimate, the standard error
will be again decreased, unless

    \rho_{14} = 0.

The condition that the standard error in our present case shall be less
than that in the last is, in fact,

    \left\{ r_{12}^2 + r_{13}^2 + r_{14}^2 - r_{12}^2 r_{34}^2 - r_{23}^2 r_{14}^2 - r_{13}^2 r_{24}^2
      - 2 (r_{12} r_{14} r_{24} + r_{13} r_{14} r_{34} + r_{12} r_{13} r_{23})
      + 2 (r_{12} r_{14} r_{23} r_{34} + r_{13} r_{14} r_{23} r_{24} + r_{12} r_{13} r_{24} r_{34}) \right\} (1 - r_{23}^2)
    > (r_{12}^2 + r_{13}^2 - 2 r_{12} r_{13} r_{23}) (1 - r_{23}^2 - r_{24}^2 - r_{34}^2 + 2 r_{23} r_{24} r_{34}).

This may be finally reduced to

    (r_{14} - r_{13} r_{34} - r_{12} r_{24} - r_{14} r_{23}^2 + r_{13} r_{23} r_{24} + r_{12} r_{23} r_{34})^2 > 0,

that is, \rho_{14}^2 > 0.

The treatment of the general case of n variables, so far as regards
obtaining the regressions, is obvious, and it is unnecessary to give it
at length.

We can now see that the use of normal regression formulas is quite
legitimate in all cases, so long as the necessary limitations of
interpretation are recognised. Bravais' r always remains a coefficient of
correlation. These results I must plead as justification for my use of
normal formulas in two cases* where the correlation was markedly
non-normal.

"Mathematical Contributions to the Theory of Evolution. On
a Form of Spurious Correlation which may arise when
Indices are used in the Measurement of Organs." By
Karl Pearson, F.R.S., University College, London.
Received December 29, 1896. Read February 18, 1897.

(1) If the ratio of two absolute measurements on the same or
different organs be taken it is convenient to term this ratio an index.

If u = f_1(x, y) and v = f_2(z, y) be two functions of the three variables
x, y, z, and these variables be selected at random so that there exists
no correlation between x, y; y, z; or z, x, there will still be found to
exist correlation between u and v. Thus a real danger arises when a
statistical biologist attributes the correlation between two functions
like u and v to organic relationship. The particular case that is
likely to occur is when u and v are indices with the same denominator,
for the correlation of indices seems at first sight a very plausible
measure of organic correlation.

* 'Economic Journal,' Dec., 1895, and Dec., 1896, "On the Correlation of Total
Pauperism with Proportion of Out-relief."

490 Prof. Karl Pearson.

The difficulty and danger which arise from the use of indices was
brought home to me recently in an endeavour to deal with a consider-
able series of personal equation data. In this case it was convenient
to divide the errors made by three observers in estimating a variable
quantity by the actual value of the quantity. As a result there
appeared a high degree of correlation between three series of abso-
lutely independent judgments. It was some time before I realised
that this correlation had nothing to do with the manner of judging,
but was a special case of the above principle due to the use of indices.

A further illustration is of the following kind. Select three numbers
within certain ranges at random, say x, y, z; these will be pair
and pair uncorrelated. Form the proper fractions x/y and z/y for
each triplet, and correlation will be found between these indices.

The application of this idea to biology seems of considerable
importance. For example, a quantity of bones are taken from an
ossuarium, and are put together in groups, which are asserted to be
those of individual skeletons. To test this a biologist takes the
triplet femur, tibia, humerus, and seeks the correlation between the
indices femur/humerus and tibia/humerus. He might reasonably
conclude that this correlation marked organic relationship, and
believe that the bones had really been put together substantially in
their individual grouping. As a matter of fact, since the coefficients
of variation for femur, tibia, and humerus are approximately equal,
there would be, as we shall see later, a correlation of about 0.4 to
0.5 between these indices had the bones been sorted absolutely at
random. I term this a spurious organic correlation, or simply a
spurious correlation. I understand by this phrase the amount of
correlation which would still exist between the indices, were the
absolute lengths on which they depend distributed at random.

It has hitherto been usual to measure the organic correlation of the
organs of shrimps, prawns, crabs, &c., by the correlation of indices in
which the denominator represents the total body length or total
carapace length. Now suppose a table formed of the absolute lengths
and the indices of, say, some thousand individuals. Let an "imp"
(allied to the Maxwellian demon) redistribute the indices at random;
they would then exhibit no correlation; if the corresponding absolute
lengths followed along with the indices in the redistribution, they
also would exhibit no correlation. Now let us suppose the indices
not to have been calculated, but the imp to redistribute the
absolute lengths; these would now exhibit no organic correlation, but
the indices calculated from this random distribution would have a
correlation nearly as high, if not in some cases higher than before.
The biologist would be not unlikely to argue that the index
correlation of the imp-assorted, but probably, from the vital standpoint,
impossible beings was "organic."

As a last illustration, suppose 1000 skeletons obtained by distributing
component bones at random. Between none of their bones will
these individuals exhibit correlation. Wire the spurious skeletons
together and photograph them all, so that their stature in the
photographs is the same; the series of photographs, if measured, will show
correlation between their parts. It seems to me that the biologist
who reduces the parts of an animal to fractions of some one length
measured upon it is dealing with a series very much like these
photographs. A part of the correlation he discovers between organs is
undoubtedly organic, but another part is solely due to the nature of
his arithmetic, and as a measure of organic relationship is spurious.

Returning to our problem of the randomly distributed bones, let
us suppose the indices femur/humerus and tibia/humerus to have a
correlation of 0.45. Now suppose successively 1, 2, 3, 4, &c.,
per cent. of the bones are assorted in their true groupings, then
begins the true organic correlating of the bones. It starts from 0.45,
and will alter gradually until 100 per cent. of the bones are truly
grouped. The final value may be greater or less than 0.45, but it
would seem that 0.45 is a more correct point to measure the organic
correlation from than zero. At any rate it appears fairly certain
that if a biologist recognised that a perfectly random selection of
organs would still lead to a correlation of organ-indices, he would
be unlikely to accept index-correlation as a fair measure of the
relative intensity of correlation between organs. I shall accordingly
define spurious organic correlation as the correlation which will be
found between indices, when the absolute values of the organs have
been selected purely at random. In estimating relative correlation
by the hitherto usual measurement of indices, it seems to me that a
statement of the amount of spurious correlation ought always to be
made.
(2) Proposition I. — To find the mean of an index in terms of the
means, coefficients of variation, and coefficient of correlation of the two
absolute measurements.*

Let x_1, x_2, x_3, x_4 be the absolute sizes of any four correlated organs;
m_1, m_2, m_3, m_4 their mean values; σ_1, σ_2, σ_3, σ_4 their standard deviations;
v_1, v_2, v_3, v_4 their coefficients of variation, i.e., σ_1/m_1, σ_2/m_2, σ_3/m_3,
σ_4/m_4 respectively; r_12, r_23, r_34, r_41, r_24, r_13 the six coefficients of
correlation; ε_1, ε_2, ε_3, ε_4 the deviations of the four organs from their means,
i.e., x_1 = m_1 + ε_1, x_2 = m_2 + ε_2, x_3 = m_3 + ε_3, x_4 = m_4 + ε_4; i_13 the mean
value of the index x_1/x_3 and i_24 the mean value of x_2/x_4; Σ_13, Σ_24 the
standard deviations of the indices x_1/x_3 and x_2/x_4 respectively; and n
the total number of groups of organs.

* In all that follows, unless otherwise stated, the correlation may be of any kind
whatever, i.e., the frequencies are not supposed to follow the Gaussian or normal
law of error.

We shall suppose the ratios of the deviations to the mean absolute
values of the organs are so small that their cubes may be neglected.

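Then

    i_{13} = \frac{1}{n} S\left(\frac{x_1}{x_3}\right)
           = \frac{m_1}{n m_3} S\left[\left(1 + \frac{\epsilon_1}{m_1}\right)\left(1 + \frac{\epsilon_3}{m_3}\right)^{-1}\right]
           = \frac{m_1}{n m_3}\left(n + \frac{S(\epsilon_1)}{m_1} - \frac{S(\epsilon_3)}{m_3} - \frac{S(\epsilon_1 \epsilon_3)}{m_1 m_3} + \frac{S(\epsilon_3^2)}{m_3^2}\right),

if we neglect quantities of the third order in ε/m. But S(ε_1) =
S(ε_3) = 0, S(ε_1 ε_3) = n r_13 σ_1 σ_3, and S(ε_3^2) = n σ_3^2.

Hence

    i_{13} = \frac{m_1}{m_3}\left(1 + v_3^2 - r_{13} v_1 v_3\right). \tag{i}

Similarly

    i_{24} = \frac{m_2}{m_4}\left(1 + v_4^2 - r_{24} v_2 v_4\right). \tag{ii}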

Thus we see that the mean of an index is not the ratio of the means
of the corresponding absolute measurements, but differs by a quan-
tity depending on the correlation and variation coefficients of the
absolute measurements.

(3) Proposition II. To find the standard deviation of an index in
terms of the coefficients of variation, and coefficient of correlation of
the two absolute measurements.

    \Sigma_{13}^2 = \frac{1}{n} S\left(\frac{x_1}{x_3} - i_{13}\right)^2
                  = \frac{m_1^2}{n m_3^2} S\left\{\left(\frac{\epsilon_1}{m_1} - \frac{\epsilon_3}{m_3}\right) + \text{square terms}\right\}^2
                  = i_{13}^2 \left(v_1^2 + v_3^2 - 2 r_{13} v_1 v_3\right),

if we neglect cubic terms.

    \therefore \Sigma_{13} = i_{13} \sqrt{v_1^2 + v_3^2 - 2 r_{13} v_1 v_3}. \tag{iii}

(4) Proposition III. — To find the coefficient of correlation of two indices
in terms of the coefficients of correlation of the four absolute measurements
and their coefficients of variation.

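Let x_1/x_3 and x_2/x_4 be the two indices.

    n \rho \Sigma_{13} \Sigma_{24} = S\left(\frac{x_1}{x_3} - i_{13}\right)\left(\frac{x_2}{x_4} - i_{24}\right)
        = \frac{m_1 m_2}{m_3 m_4} S\left(\frac{\epsilon_1}{m_1} - \frac{\epsilon_3}{m_3} - \frac{\epsilon_1 \epsilon_3}{m_1 m_3} + \frac{\epsilon_3^2}{m_3^2} - v_3^2 + r_{13} v_1 v_3\right)
          \left(\frac{\epsilon_2}{m_2} - \frac{\epsilon_4}{m_4} - \frac{\epsilon_2 \epsilon_4}{m_2 m_4} + \frac{\epsilon_4^2}{m_4^2} - v_4^2 + r_{24} v_2 v_4\right)
        = \frac{m_1 m_2}{m_3 m_4} S\left(\frac{\epsilon_1}{m_1} - \frac{\epsilon_3}{m_3}\right)\left(\frac{\epsilon_2}{m_2} - \frac{\epsilon_4}{m_4}\right),

if we neglect terms of the cubic order.

Hence, finally,

    \rho = \frac{r_{12} v_1 v_2 - r_{14} v_1 v_4 - r_{23} v_2 v_3 + r_{34} v_3 v_4}{\sqrt{v_1^2 + v_3^2 - 2 r_{13} v_1 v_3} \, \sqrt{v_2^2 + v_4^2 - 2 r_{24} v_2 v_4}}. \tag{iv}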

(5) Thus we have expressed ρ in terms of the four coefficients of
correlation and the four coefficients of variation of the absolute
measurements which form the indices.

We may draw the following conclusions :

(i.) The correlation between two indices will always vanish when
the four absolute measurements forming the indices are quite
uncorrelated.

(ii.) If two of the organs are perfectly correlated, let us say made
identical, for example, the third and fourth, so that r_34 = 1 and v_3 =
v_4, we find

    \rho = \frac{r_{12} v_1 v_2 - r_{13} v_1 v_3 - r_{23} v_2 v_3 + v_3^2}{\sqrt{v_1^2 + v_3^2 - 2 r_{13} v_1 v_3} \, \sqrt{v_2^2 + v_3^2 - 2 r_{23} v_2 v_3}}. \tag{v}

This is the coefficient of correlation between two indices with the
same denominator (x_1/x_3 and x_2/x_3).

The value of ρ in (v) does not vanish if the remaining organs be
quite uncorrelated, i.e., r_12 = r_13 = r_23 = 0. In this case

    \rho_0 = \frac{v_3^2}{\sqrt{v_1^2 + v_3^2} \, \sqrt{v_2^2 + v_3^2}}. \tag{vi}

This is the measure of the spurious correlation. For the special
case in which the coefficients of variation are all the same, ρ_0 = 0.5.
When the absolute sizes of organs are very feebly correlated, then
in most cases there will be a considerable correlation of indices.

Example (a). Suppose three organs, x_1, x_2, and x_3, to have sensibly
equal coefficients of variation, and that the correlation of x_1 and x_2 =
r_12 = r and of x_1 and x_3, as well as of x_2 and x_3 = r'.

Then:

    \rho = 0.5 \times \frac{1 + r - 2r'}{1 - r'} = 0.5 + 0.5 \, \frac{r - r'}{1 - r'}.

This formula illustrates well in a specially simple case how the
correlation in the indices diverges from the spurious value 0.5, as we
alter r and r' from zero, i.e., as we introduce organic correlation.
According as r, the correlation of the numerators, is greater or less
than r', the correlation of the numerator with the denominator, the
actual index correlation can be greater or less than the spurious
value.

Example (b). If z_1, z_2 be the indices, then in the case of normal
correlation the contour lines of the correlation surface for the indices
are given by

    \frac{z_1^2}{\Sigma_1^2 (1 - \rho^2)} - \frac{2 \rho z_1 z_2}{\Sigma_1 \Sigma_2 (1 - \rho^2)} + \frac{z_2^2}{\Sigma_2^2 (1 - \rho^2)} = \text{constant},

where Σ_1, Σ_2, and ρ are given by (iii) and (iv) above.

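The contour lines of a surface of spurious index correlation are
given by

    \frac{z_1^2}{i_{13}^2}\left(v_2^2 + v_3^2\right) - 2 \, \frac{z_1 z_2}{i_{13} i_{23}} \, v_3^2 + \frac{z_2^2}{i_{23}^2}\left(v_1^2 + v_3^2\right) = \text{constant},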

while the uncorrelated distribution of the numerators x_1 and x_2 is
given by the contours

    \frac{x_1^2}{\sigma_1^2} + \frac{x_2^2}{\sigma_2^2} = \text{constant}.

We are thus able to mark the growth of the spurious correlation
as we increase v_3 from zero; we see the axes of the ellipses diminishing
and their directions beginning to rotate.

Example (c). To find the spurious correlation between the two chief
cephalic indices.

I have calculated the following results from the measurements
made on 100 "Altbayerisch" ♂ skulls, by Professor J. Ranke. See
his 'Anthropologie der Bayern,' Bd. i, Kapitel v, S. 194.

Breadth of skull:*     m_1 = 150.47,   σ_1 = 5.8488,   v_1 = 3.8871.
Height of skull:       m_2 = 133.78,   σ_2 = 4.6761,   v_2 = 3.4954.
Length of skull:       m_3 = 180.58,   σ_3 = 5.8441,   v_3 = 3.2363.
Cephalic index, B/L:   i_13 = 83.41,   Σ_13 = 3.5794,  V_13 = 4.2913.
Cephalic index, H/L:   i_23 = 74.23,   Σ_23 = 3.6305,  V_23 = 4.8909.
Cephalic index, H/B:   i_21 = 89.12,   Σ_21 = 4.1752,  V_21 = 4.6849.

The coefficients of correlation may at once be deduced:

Breadth and length:   r_13 = (v_1^2 + v_3^2 − V_13^2)/(2 v_1 v_3) = 0.2849.
Height and length:    r_23 = (v_2^2 + v_3^2 − V_23^2)/(2 v_2 v_3) = −0.0543.
Height and breadth:   r_12 = (v_2^2 + v_1^2 − V_21^2)/(2 v_1 v_2) = 0.1243.

This is the first table, so far as I am aware, that has been published
of the variation and correlation of the three chief cephalic lengths.†
It shows us that there is not at all a close correlation between these
chief dimensions of the skull, and that a small compensating factor
for size is to be sought in the correlation of height and length, i.e.,
while a broad skull is probably a long skull and also a high skull, a
high skull will probably be a short skull, and a low skull a long skull.

Without substituting the values of v_1, v_2, v_3, r_12, r_13, r_23 in
(v), we can find ρ, or the correlation between breadth/length and
height/length indices, from:

    \rho = \frac{V_{13}^2 + V_{23}^2 - V_{12}^2}{2 V_{13} V_{23}}.

This follows at once from the general theorem given in my memoir
on "Regression, Panmixia, and Heredity," 'Phil. Trans.,' vol. 187,
A, p. 279, or by substitution of the above values of r_12, r_13, r_23 in (v).
We find

    \rho = 0.4857.

If we calculate from (vi) the correlation between the same cephalic
indices on the hypothesis that their heights, breadths and lengths
are distributed at random, i.e., that our "Imp" has constructed a
number of arbitrary and spurious skulls from Professor Ranke's
measurements, we find:

    \rho = 0.4008.

It seems to me that a quite erroneous impression would be formed
of the organic correlation of the human skull, did we judge it by the
magnitude of the correlation coefficient (0.4857) for the two chief
cephalic indices, for no less than 0.4008 of this would remain, if we
destroyed all organic relationship between the lengths on which these
indices are based.

* All the absolute measures given are in millimetres, and the coefficients of
variation are percentage variations, i.e., they must be divided by 100 before being
used in formulae (i), (ii), and (iii).

† I hope later to treat correlation in man with reference to race, sex, and
organ, as I have treated variation.

Example (d). To find the spurious correlation between the indices
femur/humerus and femur/tibia.

The following results have been calculated* from measurements
made by Koganei on Aino skeletons. (See 'Mittheilungen aus der
medicinischen Facultät der K. J. Universität, Tokio,' Bd. I. Tables.)

I have kept the sexes apart although there are but few of each.

♂ Skeletons. Number = 40 to 44. Measurements in centimetres.

Femur, F:     m_1 = 40.845,   σ_1 = 1.957,   v_1 = 4.792.
Tibia, T:     m_2 = 31.740,   σ_2 = 1.577,   v_2 = 4.970.
Humerus, H:   m_3 = 29.593,   σ_3 = 1.337,   v_3 = 4.517.

The following coefficients of correlation were calculated directly:

Femur and tibia:     r_12 = 0.8266.
Femur and humerus:   r_13 = 0.8585.
Tibia and humerus:   r_23 = 0.7447.

From these were deduced by the formulae of this paper†:

Index, F/T:   i_12 = 128.75,   Σ_12 = 3.7075,   V_12 = 2.8795.
Index, F/H:   i_13 = 137.92,   Σ_13 = 3.4084,   V_13 = 2.4714.
Index, T/H:   i_23 = 107.02,   Σ_23 = 3.6675,   V_23 = 3.4271.

Hence we find for the correlation of the indices F/H and T/H:

    \rho = 0.5644.

But the spurious correlation, if the bones had been grouped at
random, would have been

    \rho_0 = 0.4557.

* I have to thank Miss Alice Lee for a considerable part of the arithmetic work
of this example.

† The values for the indices are not in absolute agreement with those to be
deduced from the lengths, for it was not always possible to use the same skeleton
for femur and humerus as for tibia and humerus, i.e., sometimes one or other bone
was missing. For the same reason, the constants for the absolute lengths do not
agree entirely with those given for Ainos in my paper on "Variation in Man and
Woman" (in 'The Chances of Death and other Studies in Evolution,' vol. 1,
p. 303), for the simple reason that I there used every available bone, and not every
available pair, as here.

Tabulating the corresponding quantities for the other sex, we
find:

♀ Skeletons. Number = 22 to 24. Measurements in centimetres.

Femur, F:     m_1 = 38.075,   σ_1 = 1.494,   v_1 = 3.924.
Tibia, T:     m_2 = 29.800,   σ_2 = 1.576,   v_2 = 5.289.
Humerus, H:   m_3 = 27.565,   σ_3 = 1.109,   v_3 = 4.022.

Femur and tibia:     r_12 = 0.8457.
Femur and humerus:   r_13 = 0.8922.
Tibia and humerus:   r_23 = 0.7277.

Index, F/T:   i_12 = 127.90,   Σ_12 = 3.8937,   V_12 = 3.0444.
Index, F/H:   i_13 = 138.37,   Σ_13 = 2.6930,   V_13 = 1.9462.
Index, T/H:   i_23 = 108.36,   Σ_23 = 4.1022,   V_23 = 3.7857.

    \rho = 0.6006.
    \rho_0 = 0.3904.

Hence we may conclude as follows :

(i) The absolute lengths of the long bones differ from those of the
skull in being very closely correlated.

(ii) The use of indices for the long bones would appear to mini-
mise, rather than, as in the case of the skull, to exaggerate this
correlation.

(iii) If we measure, however, organic correlation of the indices by
ρ − ρ_0, we shall find index correlation less than absolute length
correlation for both long bones and skull, and in both cases the former
comparatively small as compared with the latter.

(iv) The results for the 24 female skeletons, although based on
but few data, serve on the whole to confirm the male results.*

(6.) From the above examples it will be seen that the method,
which judges of the intensity of organic correlation by the reduction
of all absolute measures to indices, the denominators of which are
some one absolute measurement, is not free from obscurity; for this
method would give the major portion of the observed index
correlation had the parts of the animal been thrown together entirely at
random, i.e., if there were no organic correlation at all. The following
additional remarks may be of interest. The results (iv) to (vi)
show us that the correlation coefficients of indices are functions, not
only of the correlation coefficients of absolute measurements, but also
of the coefficients of variation of the latter measurements. Hence,
unless the coefficients of variation be constant for local races, it is
impossible that the coefficients of correlation can be constant for
indices. In other words, the hypothesis of the constancy for local
races of correlation, and that of the constancy for local races of
variation, stand on exactly the same footing.

* The fact that the male is more variable in height-sitting, in femur, and in
tibia than the female, while she appears to be more variable than he is in stature,
led me to prophesy, in my paper on "Variation in Man and Woman," that the
female would be found to be more closely correlated in the bones forming stature
than the male. This appears to be the case for the femur and tibia of Ainos.

The conclusions of this paper although applied to organic correla-
tion are equally valid so far as concerns the use of indices in judging
the correlation of either physical or economic phenomena. It was,
indeed, a difficulty arising from my discussion of personal judgments
— a spurious correlation between the judgments of different observers
— which first drew my attention to the matter.

Note, January 13, 1897. The result described by Professor
Pearson evidently affects the value of the correlation coefficients
determined by me in Crangon and Carcinus ('Roy. Soc. Proc.,' vols.
51 and 54), because I have always expressed the size of the organs
measured in terms of body length.

In order to show the effect of this, I have lately performed, at
Professor Pearson's suggestion, the following experiment: It happens
that my measures of Plymouth shrimps are recorded in a book, in
the order in which they were measured, and therefore at random as
regards carapace length or other characters. I constructed from
these records 420 "spurious" shrimps, in the following way: the
total length of the first shrimp in the book was associated with the
carapace length of the tenth shrimp and the "post-spinous length"
of the twentieth, and so throughout. Evidently these three measures
were associated at random, and we might expect that these spurious
shrimps would show no organic correlation; but when the carapace
lengths and "post-spinous lengths" of these spurious shrimps
were divided by the body length, and the correlation between the
resulting indices was determined, the value of r was found to be 0.38,
the value for real shrimps being 0.81, or the correlation due to the
use of indices forms 47 per cent. of the observed value.

W. F. R. Weldon.

"Note to the Memoir by Professor Karl Pearson, F.R.S., on
Spurious Correlation." By Francis Galton, F.R.S.
Received January 4. Read February 18, 1897.

I send this note to serve as a kind of appendix to the memoir of
Professor K. Pearson, believing that it may be useful in enabling
others to realise the genesis of spurious correlation. It is important
though rather difficult to do so, because the results arrived at in the
memoir, which are of serious interest to practical statisticians, have
at first sight a somewhat paradoxical appearance.

```
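The illustration in section (1), where three numbers x, y, z are drawn at random and the fractions x/y and z/y are formed, can be tried numerically. The following is only a sketch under stated assumptions: the range 90 to 110, the sample size, the seed, and the helper names `pearson_r` and `simulate` are all illustrative choices, not anything given in the memoir. Because the three variables share the same range, their coefficients of variation are equal, so the index correlation should come out near the value 0.5 that the paper gives for that special case.

```python
import random
from math import sqrt


def pearson_r(a, b):
    """Product-moment correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    s_aa = sum((p - mean_a) ** 2 for p in a)
    s_bb = sum((q - mean_b) ** 2 for q in b)
    return s_ab / sqrt(s_aa * s_bb)


def simulate(n=20000, seed=1):
    """Draw independent x, y, z and correlate the indices x/y and z/y.

    The range (90, 110) is an assumption chosen so that deviations are
    small compared with the means, as the memoir supposes.
    """
    rng = random.Random(seed)
    x = [rng.uniform(90, 110) for _ in range(n)]
    y = [rng.uniform(90, 110) for _ in range(n)]
    z = [rng.uniform(90, 110) for _ in range(n)]
    r_numerators = pearson_r(x, z)  # the absolute values themselves
    r_indices = pearson_r(
        [a / b for a, b in zip(x, y)],
        [c / b for c, b in zip(z, y)],
    )
    return r_numerators, r_indices


r_numerators, r_indices = simulate()
print(f"correlation of x and z:     {r_numerators:+.3f}")
print(f"correlation of x/y and z/y: {r_indices:+.3f}")
```

The numerators remain practically uncorrelated while the two indices do not, which is the effect the memoir calls spurious correlation.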
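Formulae (iv), (v) and (vi) of section (5) are easy to mistype by hand, so a small transcription into code may help when checking the special cases the memoir mentions. This is a sketch only; the function names `index_correlation`, `same_denominator` and `spurious` are hypothetical labels introduced here, and the numerical inputs in the usage lines are made-up values rather than data from the paper.

```python
from math import sqrt


def index_correlation(v1, v2, v3, v4, r12, r13, r14, r23, r24, r34):
    """Formula (iv): correlation of the indices x1/x3 and x2/x4."""
    numerator = r12 * v1 * v2 - r14 * v1 * v4 - r23 * v2 * v3 + r34 * v3 * v4
    denominator = sqrt(v1**2 + v3**2 - 2 * r13 * v1 * v3) * sqrt(
        v2**2 + v4**2 - 2 * r24 * v2 * v4
    )
    return numerator / denominator


def same_denominator(v1, v2, v3, r12, r13, r23):
    """Formula (v): organ 4 made identical with organ 3 (r34 = 1, v4 = v3)."""
    return index_correlation(v1, v2, v3, v3, r12, r13, r13, r23, r23, 1.0)


def spurious(v1, v2, v3):
    """Formula (vi): formula (v) with r12 = r13 = r23 = 0."""
    return same_denominator(v1, v2, v3, 0.0, 0.0, 0.0)


# Conclusion (i): four uncorrelated organs give uncorrelated indices.
print(index_correlation(0.05, 0.04, 0.03, 0.06, 0, 0, 0, 0, 0, 0))
# Equal coefficients of variation give a spurious correlation of 0.5.
print(spurious(0.05, 0.05, 0.05))
# Example (a) with illustrative r = 0.6, r' = 0.2: 0.5 + 0.5 (r - r')/(1 - r').
print(same_denominator(0.05, 0.05, 0.05, 0.6, 0.2, 0.2))
```

Writing (v) and (vi) as calls to (iv) mirrors how the memoir derives them, so a slip in one formula cannot hide in another.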
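Examples (c) and (d) obtain a correlation from three coefficients of variation, using the relation (a² + b² − c²)/(2ab), where a and b are the coefficients of variation of two quantities and c that of their ratio. The sketch below applies that relation to the figures tabulated in the memoir; the function name `corr_from_cvs` is a hypothetical label, and only those tabulated values that the relation reproduces to the printed precision are shown.

```python
def corr_from_cvs(cv_a, cv_b, cv_ratio):
    """Correlation of two quantities from their coefficients of variation
    and the coefficient of variation of their ratio."""
    return (cv_a**2 + cv_b**2 - cv_ratio**2) / (2 * cv_a * cv_b)


# Example (c), skulls: correlations of absolute lengths from v and V.
r13 = corr_from_cvs(3.8871, 3.2363, 4.2913)  # breadth and length
r23 = corr_from_cvs(3.4954, 3.2363, 4.8909)  # height and length

# Index correlations from the coefficients of variation of the indices.
rho_skull = corr_from_cvs(4.2913, 4.8909, 4.6849)   # B/L with H/L
rho_male = corr_from_cvs(2.4714, 3.4271, 2.8795)    # F/H with T/H, male
rho_female = corr_from_cvs(1.9462, 3.7857, 3.0444)  # F/H with T/H, female

print(f"r13 = {r13:.4f}, r23 = {r23:.4f}")
print(f"rho (skull) = {rho_skull:.4f}")
print(f"rho (male bones) = {rho_male:.4f}, rho (female bones) = {rho_female:.4f}")
```

The same function serves for absolute measurements and for indices because an index of two indices with a common denominator, such as (B/L)/(H/L), is again a ratio of the two numerators.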
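Professor Weldon's experiment in the note of January 13, 1897, pairing the body length of one record with the carapace length of the tenth and the post-spinous length of the twentieth, can be imitated on synthetic data. Everything numerical below is an assumption made for illustration: the hypothetical `make_records` generator, its size factors and means, the wrap-around at the end of the record book, and the sample size. None of it reproduces Weldon's shrimp measurements, so the printed values will not match his 0.38 and 0.81.

```python
import random
from math import sqrt


def pearson_r(a, b):
    """Product-moment correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    s_aa = sum((p - mean_a) ** 2 for p in a)
    s_bb = sum((q - mean_b) ** 2 for q in b)
    return s_ab / sqrt(s_aa * s_bb)


def make_records(n, rng):
    """Hypothetical animals: (body length, carapace length, post-spinous length)."""
    records = []
    for _ in range(n):
        size = rng.gauss(0, 1)    # factor shared by all three measures
        organs = rng.gauss(0, 1)  # factor shared by the two organs only
        body = 100 * (1 + 0.03 * size + 0.04 * rng.gauss(0, 1))
        carapace = 30 * (1 + 0.03 * size + 0.03 * organs + 0.02 * rng.gauss(0, 1))
        post_spinous = 20 * (1 + 0.03 * size + 0.03 * organs + 0.02 * rng.gauss(0, 1))
        records.append((body, carapace, post_spinous))
    return records


def spurious_records(records, offsets=(0, 9, 19)):
    """Body length of record k with the carapace of record k + 9 and the
    post-spinous length of record k + 19 (wrapping round is an assumption)."""
    n = len(records)
    return [
        (
            records[(k + offsets[0]) % n][0],
            records[(k + offsets[1]) % n][1],
            records[(k + offsets[2]) % n][2],
        )
        for k in range(n)
    ]


def index_correlation(records):
    """Correlation of carapace/body with post-spinous/body."""
    return pearson_r(
        [c / b for b, c, _ in records],
        [p / b for b, _, p in records],
    )


rng = random.Random(2)
real = make_records(20000, rng)
spurious = spurious_records(real)

rho_real = index_correlation(real)
rho_spurious = index_correlation(spurious)
r_raw_spurious = pearson_r([c for _, c, _ in spurious], [p for _, _, p in spurious])

print(f"index correlation, real animals:     {rho_real:.3f}")
print(f"index correlation, spurious animals: {rho_spurious:.3f}")
print(f"organ correlation, spurious animals: {r_raw_spurious:+.3f}")
```

The offset pairing destroys the correlation between the absolute lengths, yet a substantial index correlation survives, which is the share of the observed value that the memoir proposes to report as spurious.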